The Annals of Statistics

2011, Vol. 39, No. 6, 3211-3233 

DOI: 10.1214/11-AOS933 

© Institute of Mathematical Statistics, 2011



AN ASYMPTOTIC ERROR BOUND FOR TESTING MULTIPLE 
QUANTUM HYPOTHESES 

By Michael Nussbaum and Arleta Szkoła

Cornell University and Max Planck Institute 

We consider the problem of detecting the true quantum state 
among r possible ones, based on measurements performed on n copies
of a finite-dimensional quantum system. A special case is the problem 
of discriminating between r probability measures on a finite sample 
space, using n i.i.d. observations. In this classical setting, it is known 
that the averaged error probability decreases exponentially with ex- 
ponent given by the worst case binary Chernoff bound between any 
possible pair of the r probability measures. Define analogously the 
multiple quantum Chernoff bound, considering all possible pairs of 
states. Recently, it has been shown that this asymptotic error bound 
is attainable in the case of r pure states, and that it is unimprovable 
in general. Here we extend the attainability result to a larger class of 
r-tuples of states which are possibly mixed, but pairwise linearly in- 
dependent. We also construct a quantum detector which universally 
attains the multiple quantum Chernoff bound up to a factor 1/3. 

1. Introduction. Consider a finite set $\Sigma = \{P_1, \ldots, P_r\}$ of probability
distributions on a sample space $\Omega$, and the problem of discriminating between
them on the basis of observed i.i.d. data. It is well known that for the maximum
likelihood decision rule, the error probability (Bayesian for uniform
prior) decreases exponentially, with a rate given by the worst case among
the possible pairwise hypothesis testing problems. Indeed, if $\xi_{CB}(P_i, P_j)$
represents the rate of exponential decay of the error probability for deciding
between $P_i$ and $P_j$, given by the classical Chernoff bound



    $\xi_{CB}(P_i, P_j) = -\log \inf_{0 \le s \le 1} \int (dP_i)^{1-s} (dP_j)^{s},$

then the multiple Chernoff bound pertaining to the set $\Sigma$ has been defined as

(1.1)    $\xi_{CB}(\Sigma) := \min\{\xi_{CB}(P_i, P_j) : P_i, P_j \in \Sigma,\ P_i \neq P_j\}$

(Salikhov [28-30]). If $\pi_n$ is the maximum likelihood rule for sample size $n$,
with values in $\{1, \ldots, r\}$, then, under a uniform prior on $\Sigma$,

(1.2)    $-\frac{1}{n} \log \Pr(\pi_n \neq i) \to \xi_{CB}(\Sigma)$ as $n \to \infty$

and since $\pi_n$ is also Bayesian here, the quantity $\xi_{CB}(\Sigma)$ is the best possible
asymptotic error exponent for any decision rule under a uniform prior.
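For a finite sample space the integral in the classical Chernoff bound is a finite sum, so (1.1) can be evaluated numerically. The following is a minimal sketch, not part of the original paper: the function names, the grid search over $s$ and the example distributions are illustrative assumptions.

```python
import math
from itertools import combinations


def chernoff_exponent(p, q, grid=2001):
    """Approximate -log inf_{0<=s<=1} sum_w p(w)^(1-s) q(w)^s on a grid of s."""
    best = float("inf")
    for k in range(grid):
        s = k / (grid - 1)
        total = sum((a ** (1 - s)) * (b ** s) for a, b in zip(p, q))
        best = min(best, total)
    # Disjoint supports give a vanishing sum, i.e. an infinite exponent.
    return float("inf") if best <= 0 else -math.log(best)


def multiple_chernoff_bound(dists, grid=2001):
    """Smallest pairwise Chernoff exponent over distinct pairs, as in (1.1)."""
    return min(chernoff_exponent(p, q, grid) for p, q in combinations(dists, 2))


# Hypothetical example: three distributions on a three-point sample space.
P1, P2, P3 = [0.7, 0.2, 0.1], [0.1, 0.3, 0.6], [0.3, 0.4, 0.3]
print(multiple_chernoff_bound([P1, P2, P3]))
```

The grid search is a simplification; since the sum is convex in $s$, a one-dimensional convex minimizer could be used instead.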

On terminology. When large deviation type limits are written in logarithmic
form as in (1.2), then the right-hand side $\xi_{CB}(\Sigma)$ is referred to as the
rate of exponential decay or, in information theory, as the asymptotic error
exponent, to be maximized by decision rules. Throughout the paper, we adhere
to this formulation as a convenient equivalent to minimizing asymptotic
error.

We consider here the analogous problem in a quantum statistical setting,
where $\Sigma = \{\rho_1, \ldots, \rho_r\}$ is a set of density operators on the finite-dimensional
complex Hilbert space $\mathbb{C}^d$. Recall that by definition a density operator $\rho$,
describing the state of a physical system, is a complex, self-adjoint, positive
semidefinite matrix satisfying the normalization condition $\mathrm{tr}[\rho] = 1$. If all
operators $\rho_i \in \Sigma$ commute, then the corresponding matrix representations
are jointly diagonalizable, and the problem becomes one of discriminating
between the associated finite probability distributions appearing on the matrix
diagonal.

The starting point for our investigation is the recent extension of the
Chernoff binary testing bound to the quantum setting [2, 3, 22]. In full
analogy to the classical case, the quantum Chernoff bound specifies the
asymptotic error in the decision problem between $\rho_i$ and $\rho_j$, based on a rule
using the outcomes of measurements performed on $n$ copies of the basic
quantum system.

The case of multiple hypotheses (r > 2) represented by quantum states 
has received some interest in the literature over the past three decades; cf. 
[6, 12, 14, 15, 25, 26, 34] and overviews in [7, 9, 11]. While in the binary 
case (r = 2) the optimal quantum test is described explicitly by the Holevo- 
Helstrom projections, in the case r > 2 only an implicit description in terms 
of an extremal problem is available (Holevo [14], Yuen, Kennedy, Lax [34]). 
Parthasarathy [26] has dubbed the quantum Bayes rule "quantum maximum 
likelihood," in view of the fact that in the classical case, for a finite number of 
hypotheses, the Bayes rule for uniform prior is indeed maximum likelihood. 

Numerous new contributions to multiple quantum hypothesis testing
appeared in the very recent past, for example, [1, 4, 17-19, 21, 27, 32, 33]. The
main focus has been on characterizing the Bayes rule of [14, 34] and finding
approximations to it. We focus here on the asymptotics of the error probability
based on measurements performed on $n$ copies of the basic quantum
system. The true state is thus described by the $n$th tensor power $\rho_i^{\otimes n}$ of
one of the original density operators $\rho_i \in \Sigma$. Parthasarathy [26] established
consistency of the Bayes rule and also an exponential rate of decay of the
error probability, without specifying the error exponent. The first step
toward finding the optimal asymptotic error, for which a similar structure as
in the classical case (1.2) was conjectured, was made in [24]. It was shown
that if all $\rho_i$ are pure states [$\mathrm{rank}(\rho_i) = 1$], then the optimal asymptotic
error is given by $\xi_{QCB}(\Sigma)$, defined as the worst case error for quantum
discrimination between any pair of distinct states involved. Thus, the situation
is indeed analogous to the classical case (1.2), and the quantity $\xi_{QCB}(\Sigma)$
describing the asymptotics of the error probability should be termed the
multiple quantum Chernoff bound.

The fact that $\xi_{QCB}(\Sigma)$ is valid as a lower error bound is relatively
straightforward to prove; for a precise statement of the result from [24], cf.
Theorem 1. Attainability for pure states has been shown in [24] by constructing
a measurement based on a Gram-Schmidt orthonormalization of the $r$
unit vectors representing the $\rho_i \in \Sigma$. It should be mentioned that earlier
Holevo [16] showed such a measurement to be an approximation to the
Bayes rule. In [23], it was shown that without any restriction on the nature
of the states, an asymptotic error $\xi_{QCB}(\Sigma)$ is achievable up to a factor which
is between $2/(r(r-1))$ and 1, for $r$ being the number of hypotheses.

In the present paper, we develop a new decision rule generalizing two
known asymptotically optimal ones, in the following sense: if all states
commute, the method reduces to classical maximum likelihood (as does the
Bayes rule of [14, 34]). If all states are pure, then it coincides with the
orthonormalization algorithm of [24]. We establish that this rule attains
asymptotic error $\xi_{QCB}(\Sigma)$ for a class of $r$-tuples of states which fulfill
Condition (LI) below. The condition allows for mixed states but excludes faithful
ones (full rank density matrices). We then show that a modified version of
our rule is near optimal, in the sense that it attains at least $\frac{1}{3}\xi_{QCB}(\Sigma)$
universally.

The outline of our paper is as follows. In Section 2, we introduce notation, 
specify the mathematical framework, and state precisely our main results in 
Theorems 2 and 3. Some further discussion of the quantum Bayes rule, of 
results in statistics resembling the multiple Chernoff bound and other topics 
follows at the end of that section. In Section 3, our new quantum decision 
rule is developed, along with Lemma 1 providing a basic error bound.
Section 4 treats the case of pairwise linearly independent states [Condition (LI)
and Theorem 2]. Section 5 shows how our decision rule reduces to maximum
likelihood in the commuting case, such that Lemma 1 reproduces the multi- 
ple Chernoff bound of [28, 29]. Section 6 concerns the general attainability 
of the near optimal error bound (Theorem 3).

2. Notation and preliminaries. We will describe here the formalism for
the simplest possible nonclassical setup of discrimination between several
quantum hypotheses. A density matrix $\rho$ is a complex, self-adjoint, positive
$d \times d$ matrix satisfying the normalization condition $\mathrm{tr}[\rho] = 1$, where
$\mathrm{tr}[\cdot]$ is the trace operation. Here, "positive" means nonnegative definite. We
identify a $d \times d$ density matrix with a quantum state on $\mathbb{C}^d$; we also use
"matrix" and "operator" interchangeably. The $r$ hypotheses are described
by states $H_i : \rho = \rho_i$, $i = 1, \ldots, r$. Physically, discriminating between these
states corresponds to performing a measurement on the quantum system.
Mathematically, a quantum decision rule with $r$ possible outcomes is a set of
complex, self-adjoint, positive $d \times d$ matrices $E = \{E_1, \ldots, E_r\}$ satisfying
$\sum_{i=1}^{r} E_i = 1$, where $1$ is the unit matrix. The $r$-tuple $E$ is often called
a POVM (positive operator valued measure); we will refer to it as a quantum
multiple test or a quantum detector. In the special case where all $E_i$ are
projections, the $r$-tuple $E$ is called a PVM (projection valued measure) or
von Neumann measurement. The individual success probability, that is, the
probability to accept hypothesis $H_i$ when $\rho_i$ is the true state, is given by

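    $\mathrm{Succ}_i(E) := \mathrm{tr}[\rho_i E_i].$

The corresponding individual error probability, that is, the probability of
rejecting the true state $\rho_i$ according to the decision rule, is

    $\mathrm{Err}_i(E) = 1 - \mathrm{Succ}_i(E) = \mathrm{tr}[\rho_i(1 - E_i)] = \sum_{j=1,\, j \neq i}^{r} \mathrm{tr}[\rho_i E_j].$

The total (averaged) error probability is then

    $\mathrm{Err}(E) := \frac{1}{r} \sum_{i=1}^{r} \mathrm{Err}_i(E) = \frac{1}{r} \sum_{i=1}^{r} \mathrm{tr}[\rho_i(1 - E_i)].$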

The above describes the basic setup where the finite dimension $d$ is arbitrary
and the hypotheses are equiprobable. We consider the quantum analog of
having $n$ i.i.d. observations. For this, the $r$ hypotheses are assumed to be
$H_i : \rho = \rho_i^{\otimes n}$, $i = 1, \ldots, r$, where $\rho^{\otimes n}$ is the $n$-fold tensor product of $\rho$
with itself (a $d^n \times d^n$ matrix). The detectors $E = \{E_1, \ldots, E_r\}$ now operate on
the states $\rho_i^{\otimes n}$, that is, the dimension of the components $E_i$ is $d^n \times d^n$, but $E_i$
need not have tensor product structure. The corresponding total error probability of
a detector $E$ is now

    $\mathrm{Err}_n(E) = 1 - \frac{1}{r} \sum_{i=1}^{r} \mathrm{tr}[\rho_i^{\otimes n} E_i].$
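As a small numerical illustration of the averaged error probability under a uniform prior, the sketch below evaluates $1 - \frac{1}{r}\sum_i \mathrm{tr}[\rho_i E_i]$ for matrices given as nested lists. The helper names and the two diagonal qubit states are assumptions made for this example only.

```python
def trace_of_product(a, b):
    """tr[a b] for square matrices given as nested lists."""
    n = len(a)
    return sum(a[i][k] * b[k][i] for i in range(n) for k in range(n))


def average_error(states, detector):
    """Err(E) = 1 - (1/r) * sum_i tr[rho_i E_i], uniform prior on the r states."""
    r = len(states)
    success = sum(trace_of_product(rho, e) for rho, e in zip(states, detector))
    return 1 - success / r


# Hypothetical example: two commuting (diagonal) qubit states and the PVM
# formed by the projections onto the two standard basis vectors.
rho1 = [[0.9, 0.0], [0.0, 0.1]]
rho2 = [[0.2, 0.0], [0.0, 0.8]]
E = [[[1, 0], [0, 0]], [[0, 0], [0, 1]]]
print(average_error([rho1, rho2], E))  # approximately 0.15
```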

For the case of two hypotheses $r = 2$, the Bayes test for each $n \in \mathbb{N}$ is known
to be the Holevo-Helstrom hypothesis test. It is given by the detector
$E^{*}_{(n)} = \{1 - \Pi^{*}_{n}, \Pi^{*}_{n}\}$ where

    $\Pi^{*}_{n} := \mathrm{supp}(\rho_2^{\otimes n} - \rho_1^{\otimes n})_{+},$

where $\mathrm{supp}\, a$ is the projection onto the space spanned by the columns
of $a$ and $a_{+}$ denotes the positive part of a self-adjoint operator $a$. Thus,
if $a = \sum_i \lambda_i S_i$ is the spectral decomposition using projections $S_i$, then
$a_{+} := \sum_{\lambda_i > 0} \lambda_i S_i$ and $\mathrm{supp}\, a_{+} = \sum_{\lambda_i > 0} S_i$. The Bayes test is unique up to a
possible reassignment of the projections $S_i$ corresponding to zero eigenvalues of
$a = \rho_2^{\otimes n} - \rho_1^{\otimes n}$. For $r > 2$, the Bayes detector has been described in [14, 34].
Explicit expressions for its $r$ components are not known in general; for the
convenience of the reader, we present the available implicit description below
at the end of this section.

If for a sequence of detectors $E_{(n)}$ the limit $\lim_{n \to \infty} -\frac{1}{n} \log \mathrm{Err}_n(E_{(n)})$
exists, we refer to it as the (asymptotic) error exponent. For two density
matrices $\rho_1$ and $\rho_2$, the quantum Chernoff bound is defined by

(2.1)    $\xi_{QCB}(\rho_1, \rho_2) := -\log \inf_{0 \le s \le 1} \mathrm{tr}[\rho_1^{1-s} \rho_2^{s}].$

The basic properties of $\xi_{QCB}(\rho_1, \rho_2)$ have been discussed in [3]. Some
distance-like properties have been noted by Calsamiglia et al. [8]. For the binary
discrimination problem, it is known that the Holevo-Helstrom (Bayes)
detector $E^{*}_{(n)}$ satisfies

    $\lim_{n \to \infty} -\frac{1}{n} \log \mathrm{Err}_n(E^{*}_{(n)}) = \xi_{QCB}(\rho_1, \rho_2),$

thus specifying $\xi_{QCB}(\rho_1, \rho_2)$ as the optimal error exponent (cf. [2, 3, 22]),
and providing the quantum analog of the classical Chernoff bound, that
is, (1.2) for $r = 2$.
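Definition (2.1) can be explored numerically for small examples. The sketch below is restricted to real symmetric $2 \times 2$ density matrices (an assumption made so that the matrix powers have a closed form) and approximates the infimum on a grid of $s$; at $s = 0$ and $s = 1$ it uses the convention $\rho^{0} = 1$. All function names are hypothetical.

```python
import math


def sym2_power(m, s):
    """m**s for a real symmetric positive semidefinite 2x2 matrix m."""
    a, b, c = m[0][0], m[0][1], m[1][1]
    theta = 0.5 * math.atan2(2 * b, a - c)   # rotation angle diagonalising m
    cs, sn = math.cos(theta), math.sin(theta)
    mean, half = (a + c) / 2, math.hypot((a - c) / 2, b)
    l1, l2 = max(mean + half, 0.0), max(mean - half, 0.0)  # clip rounding noise
    p1, p2 = l1 ** s, l2 ** s
    off = (p1 - p2) * cs * sn
    # Reassemble p1 |v1><v1| + p2 |v2><v2| with v1 = (cs, sn), v2 = (-sn, cs).
    return [[p1 * cs * cs + p2 * sn * sn, off],
            [off, p1 * sn * sn + p2 * cs * cs]]


def quantum_chernoff_bound(rho1, rho2, grid=2001):
    """Approximate -log inf_{0<=s<=1} tr[rho1^(1-s) rho2^s] on a grid of s."""
    best = float("inf")
    for k in range(grid):
        s = k / (grid - 1)
        x, y = sym2_power(rho1, 1 - s), sym2_power(rho2, s)
        trace = sum(x[i][j] * y[j][i] for i in range(2) for j in range(2))
        best = min(best, trace)
    return float("inf") if best <= 0 else -math.log(best)


# Hypothetical example: the pure states |0><0| and |+><+|.
zero = [[1.0, 0.0], [0.0, 0.0]]
plus = [[0.5, 0.5], [0.5, 0.5]]
print(quantum_chernoff_bound(zero, plus))  # close to log 2
```

For two pure states the trace equals the squared overlap for every $0 < s < 1$, which is why the example returns approximately $\log 2$.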

For a set $\Sigma = \{\rho_1, \ldots, \rho_r\}$ of density operators on $\mathbb{C}^d$, where $r > 2$, we
have introduced in [24] the multiple quantum Chernoff bound $\xi_{QCB}(\Sigma)$

(2.2)    $\xi_{QCB}(\Sigma) := \min\{\xi_{QCB}(\rho_i, \rho_j) : 1 \le i < j \le r\}.$

If all the states are jointly diagonalizable (commuting), then (2.2) reduces to
the classical multiple Chernoff bound (1.1), as it was defined in [28, 30] for
hypotheses represented by probability distributions. Taking the minimum
over different pairs of hypotheses corresponds to the worst case in any of
the associated binary hypothesis testing problems. The following well-known
result shows that $\xi_{QCB}(\Sigma)$ as a rate exponent cannot be exceeded (cf. [24],
Theorem 1).

Theorem 1. Let $\Sigma = \{\rho_1, \ldots, \rho_r\}$ be a finite set of hypothetic states
on $\mathbb{C}^d$. Then for any sequence $\{E_{(n)}\}_{n \in \mathbb{N}}$ of quantum detectors relative to $\Sigma^{\otimes n}$,
respectively, one has

(2.3)    $\limsup_{n \to \infty} -\frac{1}{n} \log \mathrm{Err}_n(E_{(n)}) \le \xi_{QCB}(\Sigma).$

The above theorem has been extended in [23] to the case of quantum
hypotheses which correspond to identically distributed but not necessarily
independent observations. The corresponding upper bound in (2.3) is then
replaced by a mean generalized Chernoff distance, as introduced in [13] for
a stationary observation scheme in the binary case. In [23], it was also shown,
again in a wider model corresponding to a class of correlated observations,
that quantum detectors with an exponential decay of $\mathrm{Err}_n(E_{(n)})$ can be
constructed, with error exponent $\phi \cdot \xi_{QCB}(\Sigma)$ where $2/(r(r-1)) \le \phi \le 1$. The
method used in [23] yields a factor $\phi$ which may be close to one for special
ensembles of states, but the guaranteed factor $2/(r(r-1))$ decreases with the
number of hypotheses.

The following two theorems represent our main results. The support $\mathrm{supp}(\rho)$
of a state $\rho$ is the subspace of $\mathbb{C}^d$ spanned by its columns. Consider:

Condition (LI). $\mathrm{supp}(\rho_i) \cap \mathrm{supp}(\rho_j) = \{0\}$ for all $i \neq j$.

The condition is equivalent to requiring that $\rho_i$ and $\rho_j$ are linearly
independent, in the sense that for any two bases of $\mathrm{supp}(\rho_i)$ and $\mathrm{supp}(\rho_j)$,
the union set of vectors is linearly independent. This is obviously fulfilled
for a set $\Sigma$ of $r$ distinct pure states, but the condition allows for mixed
states if $d > 2$. Indeed, Condition (LI) restricts the dimension of the supports
according to the inequality $\dim \mathrm{supp}(\rho_i) + \dim \mathrm{supp}(\rho_j) \le d$ that is
valid for all $i \neq j$. However, as long as none of the density matrices is of full
rank, that is, rank equal to $d$, no constraints on the number $r$ of distinct
hypothetic states are imposed by Condition (LI).
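Condition (LI) can be checked mechanically once a basis of each support is available: two subspaces intersect trivially exactly when the rank of the combined basis equals the sum of the individual ranks. The sketch below assumes real vectors and uses plain Gaussian elimination; the function names and the tolerance are illustrative choices, not part of the paper.

```python
from itertools import combinations


def rank(vectors, tol=1e-10):
    """Rank of a list of real vectors via Gaussian elimination with pivoting."""
    rows = [list(map(float, v)) for v in vectors]
    rk = 0
    ncols = len(rows[0]) if rows else 0
    for col in range(ncols):
        if rk == len(rows):
            break
        pivot = max(range(rk, len(rows)), key=lambda i: abs(rows[i][col]))
        if abs(rows[pivot][col]) <= tol:
            continue
        rows[rk], rows[pivot] = rows[pivot], rows[rk]
        for i in range(rk + 1, len(rows)):
            f = rows[i][col] / rows[rk][col]
            rows[i] = [x - f * y for x, y in zip(rows[i], rows[rk])]
        rk += 1
    return rk


def satisfies_condition_li(supports):
    """supports[i] is a list of vectors spanning supp(rho_i)."""
    return all(rank(a + b) == rank(a) + rank(b)
               for a, b in combinations(supports, 2))


# Hypothetical example in dimension d = 3: one rank-2 state and two pure states.
supports = [[[0, 1, 0], [0, 0, 1]], [[1, 0, 0]], [[1, 1, 0]]]
print(satisfies_condition_li(supports))
```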

Theorem 2. Let $\Sigma$ be a finite set of states on $\mathbb{C}^d$ fulfilling Condition (LI).
Then there exists a sequence $\{E_{(n)}\}_{n \in \mathbb{N}}$ of quantum detectors
relative to $\Sigma^{\otimes n}$, respectively, such that

    $\lim_{n \to \infty} -\frac{1}{n} \log \mathrm{Err}_n(E_{(n)}) = \xi_{QCB}(\Sigma).$

Due to the following theorem, in the i.i.d. situation considered in
the present paper an error exponent of $\frac{1}{3}\xi_{QCB}(\Sigma)$ can always be achieved,
independently of both the (finite) number $r$ of hypotheses and the special
configuration of the corresponding states.

Theorem 3. Let $\Sigma$ be a finite set of states on $\mathbb{C}^d$. Then there exists
a sequence $\{E_{(n)}\}_{n \in \mathbb{N}}$ of quantum detectors relative to $\Sigma^{\otimes n}$, respectively,
such that

    $\liminf_{n \to \infty} -\frac{1}{n} \log \mathrm{Err}_n(E_{(n)}) \ge \frac{1}{3}\xi_{QCB}(\Sigma).$

Our results are constructive in the sense that we provide an explicitly
computable quantum detector attaining the bounds. This detector reduces
to classical maximum likelihood in the commuting case (cf. Section 5), as
does the Bayes rule, and hence attains the optimal rate exponent (1.2);
cf. [28]. Thus our method can be seen as an alternative to the quantum
Bayes rule. The above error bound is a fortiori true for the latter, and
also for computable approximations to it having at most 2 times its error
probability (Tyson [32, 33]). Our results along with those of [23] allow the
conjecture that in Theorem 3, the factor 1/3 can be removed; cf. also the
discussion point 5 below.

To further discuss the context of the main results, we note the following 
points. 

1. The quantum Bayes rule (Holevo [14], Yuen et al. [34]; cf. also
Parthasarathy [25, 26] and Hayashi [11]). Let $\Sigma = \{\rho_1, \ldots, \rho_r\}$ be such that all $\rho_i$ are
distinct states on $\mathbb{C}^d$. Let $\mathcal{E}$ be the set of all pertaining detectors $E$, that is,
$E = \{E_1, \ldots, E_r\}$ where the $E_i$ are positive self-adjoint $d \times d$ matrices with
$\sum_{i=1}^{r} E_i = 1$. Define

(2.4)    $\mu := \max_{E \in \mathcal{E}} \mathrm{Succ}(E) := \max_{E \in \mathcal{E}} \sum_{i=1}^{r} \mathrm{tr}[\rho_i E_i].$

Then there exists a unique operator $M$ on $\mathbb{C}^d$ satisfying

    $\mathrm{tr}[M] = \mu, \quad M \ge \rho_i, \quad i = 1, \ldots, r.$

Maximizers $E^{*} = \{E_1^{*}, \ldots, E_r^{*}\} \in \mathcal{E}$ of (2.4) exist by compactness and
continuity, and any such maximizer (a Bayes rule) satisfies

(2.5)    $M = \sum_{i=1}^{r} \rho_i E_i^{*} = \sum_{i=1}^{r} E_i^{*} \rho_i,$
         $(M - \rho_i) E_i^{*} = E_i^{*} (M - \rho_i) = 0, \quad i = 1, \ldots, r.$

A proof using only elementary calculus can be found in [25], Theorem 3.1.
If $r = 2$, then the Holevo-Helstrom rule $\{1 - \Pi, \Pi\}$ for $\Pi = \mathrm{supp}(\rho_2 - \rho_1)_{+}$
is a Bayes rule. If all states $\rho_i$, $i = 1, \ldots, r$, commute, hence $\rho_i$ can be
represented as a diagonal matrix with diagonal elements $p_{ij}$, $j = 1, \ldots, d$, then $M$
is a diagonal matrix with diagonal elements $m_j = \max_{i=1,\ldots,r} p_{ij}$. Then any
Bayes rule $E^{*}$ with diagonal matrices $E_i^{*}$ is maximum likelihood, assigning
0 or 1 to the diagonals of $E_i^{*}$, such that a 1 is at $(j,j)$ only if $p_{ij} = m_j$.

2. Pretty good measurement. Let $\Sigma = \{\rho_1, \ldots, \rho_r\}$ be a set of pairwise
distinct density operators with respective a priori probabilities $p_i$. Define the
positive semidefinite operator $\rho = \sum_{i=1}^{r} p_i \rho_i$. A possible quantum detector
relative to $\Sigma$ is of the form

    $E_i^{PGM} := \rho^{-1/2} p_i \rho_i \rho^{-1/2}, \quad i = 1, \ldots, r.$

(The inverse is understood to be taken on the support of $\rho$ only.) It represents
the widely investigated POVMs called pretty good measurements (PGM).
These are known to be a good approximation of the quantum Bayes rule: if $\Sigma$
is a set of pure states, then the averaged success probability $\mathrm{Succ}(PGM) =
\sum_{i=1}^{r} p_i \mathrm{Succ}_i(PGM)$ is lower bounded by a result of Barnum and Knill [5],

    $\mathrm{Succ}(PGM) \ge \Big( \max_{E \in \mathcal{E}} \sum_{i=1}^{r} p_i \mathrm{Succ}_i(E) \Big)^{2},$

where $\mathcal{E}$ denotes the set of quantum detectors relative to $\Sigma$. For further
bounds on $\mathrm{Succ}(PGM)$ referring also to the general case of mixed states,
see [21] and references therein. To the best of our knowledge, in the literature,
the PGM has not been successfully used to study the optimal asymptotic
error exponent.

3. Classical results resembling the multiple Chernoff bound. Let $\Sigma$ be
a statistical experiment having finite parameter space $\{\theta_1, \ldots, \theta_r\}$, and $\Sigma^{n}$
be the associated product experiment corresponding to i.i.d. observations.
Torgersen [31] considered the deficiency (in the Le Cam sense)

    $\delta(\Sigma^{n}, \Sigma_0)$

of $\Sigma^{n}$ with respect to the fully informative experiment $\Sigma_0$. Here $\Sigma_0$ may be
identified, up to equivalence, with the set of $r$ point masses concentrated
on $\theta_1, \ldots, \theta_r$. It was shown ([31], Theorem 4.2) that

    $-\frac{1}{n} \log \delta(\Sigma^{n}, \Sigma_0) \to \xi_{CB}(\Sigma)$ as $n \to \infty$

with $\xi_{CB}(\Sigma)$ defined in (1.1). Krob and von Weizsäcker [20] considered the
Shannon capacity $C(\Sigma^{n})$ of $\Sigma^{n}$ construed as a communication channel, and
showed that $C(\Sigma^{n})$ approaches its upper bound $\log r$ exponentially quickly,
with rate exponent $\xi_{CB}(\Sigma)$:

    $-\frac{1}{n} \log(\log r - C(\Sigma^{n})) \to \xi_{CB}(\Sigma)$ as $n \to \infty$.

4. Linearly independent states. A stronger condition than Condition (LI)
would be that all states $\{\rho_1, \ldots, \rho_r\}$ are linearly independent (in the sense
that for any selected $r$ bases of the spaces $\mathrm{supp}(\rho_i)$, $i = 1, \ldots, r$, the union
set of vectors is linearly independent). The paper [10] gives examples of such
ensembles of states, and shows that under this stronger condition, the Bayes
detector $E = \{E_1, \ldots, E_r\}$ consists of projections $E_i$ (is a von Neumann
measurement or PVM). Lemma 2 implies that our pairwise Condition (LI)
on $\Sigma$ implies the stronger one for $\Sigma^{\otimes n}$, that is, the states $\rho_1^{\otimes n}, \ldots, \rho_r^{\otimes n}$ are
linearly independent for sufficiently large $n$.

5. Other special ensembles. It can be shown that there are other situations
besides Condition (LI) where the error exponent $\xi_{QCB}(\Sigma)$ is attainable
exactly. One condition, which does not impose any rank restrictions on the
states and thus allows for full rank density matrices $\rho_i$, is as follows. For
a set $\Sigma = \{\rho_1, \ldots, \rho_r\}$ of density operators where $r > 2$, let $\Sigma_{(i,j)-}$ be the set
where a pair $\rho_i, \rho_j$ is removed, that is, $\Sigma_{(i,j)-} = \Sigma \setminus \{\rho_i, \rho_j\}$ for $1 \le i < j \le r$.
Assume there is a pair $(i,j)$ such that

    $\xi_{QCB}(\Sigma) < \xi_{QCB}(\Sigma_{(i,j)-}).$

This condition can replace Condition (LI) in the statement of Theorem 2,
that is, the multiple quantum Chernoff bound is then attainable. The proof,
not to be presented here, consists in a combination of the sample splitting
method of [23] with Theorem 3. This further supports the conjecture that
the result of Theorem 3 is not final and the factor 1/3 there may be removed.

Throughout the paper, we use the notation $j \in \{1, \ldots, d\}$ and $j \in [1, d]$
interchangeably.

3. The detection algorithm. In this section, we construct a sequence $E_{(n)}$,
$n \in \mathbb{N}$, of quantum detectors for $\Sigma^{\otimes n}$. The construction does not rely on the
existence of asymptotically optimal quantum tests for the binary case. It is
rather a modification of a construction used in [24] which yields asymptotically
optimal quantum tests for a set of pure states. At the same time, it
represents a quantum extension of the classical ML method, different from
the Bayes rule described in (2.5).

Consider again the classical case where a set $\Sigma = \{P_1, \ldots, P_r\}$ of probability
distributions is given on a finite sample space $\Omega$ with cardinality $d$. An obvious
algorithmic description of a ML decision rule $\varphi : \Omega \to \{1, \ldots, r\}$ is as follows.
For each $\omega \in \Omega$, find a maximal element in $\{P_i(\omega)\}_{i=1}^{r}$, say $P_{i^*}(\omega)$, and
then decide $\varphi(\omega) = i^*$. Alternatively, one may successively find the largest
probabilities among all $P_i(\omega)$, identify which $P_i$ and which $\omega$ they are from,
and assign a corresponding decision on this $\omega$. This iterative approach can
be expressed in a simple algorithm in pseudocode as follows.

Algorithm 1 (Classical ML rule).

Initialize. Let $\Pi_0 = \{P_i(\omega), i = 1, \ldots, r, \omega \in \Omega\}$ be the $r \times d$-matrix of all
probabilities.
For $s = 1$ to $d$:

(i) In $\Pi_{s-1}$ find a maximal entry, $P_{i^*}(\omega^*)$ say. Set $\omega_s = \omega^*$ and decide
$\varphi(\omega_s) = i^*$.

(ii) In $\Pi_{s-1}$, all $P_i(\omega_s)$, $i = 1, \ldots, r$, are replaced by $-1$; the resulting
$r \times d$-matrix is $\Pi_s$.

After $s = d$ steps, the matrix $\Pi_d$ has entries $-1$ only (a value serving
as an indicator, chosen to be smaller than any probability). We also have
enumerated the elements of $\Omega$ as $\omega_1, \ldots, \omega_d$; on each of these, a decision $\varphi(\omega_s)$
has been made, which is ML by construction.
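The steps of Algorithm 1 can be sketched in Python as follows. This is an illustrative transcription, assuming the probabilities are stored as a nested list `probs[i][w]` with zero-based indices; the function name and the example table are hypothetical, and ties are broken by taking the first maximal entry found.

```python
def classical_ml_rule(probs):
    """Algorithm 1: probs[i][w] = P_i(w); returns [(w_s, decision), ...] in visiting order."""
    r, d = len(probs), len(probs[0])
    table = [row[:] for row in probs]          # the matrix Pi_0
    visited = []
    for _ in range(d):
        # Step (i): locate a maximal entry of the current table.
        i_star, w_star = max(((i, w) for i in range(r) for w in range(d)),
                             key=lambda iw: table[iw[0]][iw[1]])
        visited.append((w_star, i_star))
        # Step (ii): mark the whole column of w_star as used.
        for i in range(r):
            table[i][w_star] = -1
    return visited


# Hypothetical example with r = 2 distributions on d = 3 points.
print(classical_ml_rule([[0.7, 0.2, 0.1], [0.1, 0.3, 0.6]]))
```

Each sample point ends up assigned to a hypothesis with maximal probability at that point, which is the maximum likelihood decision.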

In the quantum case, there is no initial sample space $\Omega$; it only appears
after defining a measurement, which in our context can be taken to be an
orthonormal basis $\{e_s\}_{s=1}^{d}$ of $\mathbb{C}^d$. After this basis is fixed, the sample space
$\Omega = \{\omega_s\}_{s=1}^{d}$ can be identified with the basis itself, or more precisely with
the set of pertaining projectors, such that each $\omega_s = |e_s\rangle\langle e_s|$, and a classical
nonrandomized decision rule $\varphi : \Omega \to \{1, \ldots, r\}$ has to be found. Then the
quantum decision rule $E = \{E_1, \ldots, E_r\}$ is given by the PVM

(3.1)    $E_i = \sum_{s :\, \varphi(\omega_s) = i} |e_s\rangle\langle e_s|, \quad i = 1, \ldots, r.$

The algorithm we will describe constructs the basis elements $e_s$ and the
pertaining decision $\varphi(\cdot)$ iteratively, combining the ML principle underlying
Algorithm 1 with a Gram-Schmidt orthogonalization.

For each $1 \le i \le r$ let

(3.2)    $\rho_i = \sum_{j=1}^{d} \lambda_{ij} |v_{ij}\rangle\langle v_{ij}|$

be a spectral decomposition of the density matrix $\rho_i$, where $\lambda_{ij}$, $j = 1, \ldots, d$,
are the eigenvalues of $\rho_i$ appearing with their multiplicity, in arbitrary
order, and $|v_{ij}\rangle$ are the corresponding normalized eigenvectors in $\mathbb{C}^d$. Here $\langle v_{ij}|$
denotes the dual vector such that in this notation $|v_{ij}\rangle\langle v_{ij}|$ describes an
orthogonal projector onto the one-dimensional subspace of $\mathbb{C}^d$ spanned by $|v_{ij}\rangle$.
We stress that zero eigenvalues are included with their multiplicity since $d$
in (3.2) is the dimension of $\rho_i$.

Algorithm 2 (A quantum decision rule).

Initialize. Let $\Lambda_0 = \{\lambda_{ij}, i = 1, \ldots, r, j = 1, \ldots, d\}$ be the $r \times d$-matrix of
all eigenvalues. Let $e_0 = 0$ be the zero vector in $\mathbb{C}^d$.
For $s = 1$ to $d$:

(i) In $\Lambda_{s-1}$ find a maximal entry, $\lambda_{i^*j^*}$ say. Set $e_s$ to be a unit vector
such that

(3.3)    $e_s \in \mathrm{span}(e_1, \ldots, e_{s-1}, v_{i^*j^*}), \quad e_s \perp \mathrm{span}(e_1, \ldots, e_{s-1})$

and decide $\varphi(|e_s\rangle\langle e_s|) = i^*$.

(ii) In $\Lambda_{s-1}$, all $\lambda_{ij}$ such that $v_{ij} \in \mathrm{span}(e_1, \ldots, e_s)$ are replaced by $-1$;
the resulting $r \times d$-matrix is $\Lambda_s$.

Again, after $s = d$ steps, the matrix $\Lambda_d$ has entries $-1$ only. We also
have constructed an orthonormal basis $e_1, \ldots, e_d$ and on each of these, an
associated decision $\varphi(|e_s\rangle\langle e_s|)$. The crucial step (3.3) is recognized to define
a Gram-Schmidt orthogonalization process. The quantum detector now is
given by the PVM (3.1).
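Algorithm 2 can be sketched in the same style. The version below is only an illustration under simplifying assumptions: the spectral decompositions (3.2) are supplied by the caller as `eigvals[i][j]` and real eigenvectors `eigvecs[i][j]`, indices are zero-based, and membership in a span is decided with a numerical tolerance `tol`. These names and choices are not from the paper.

```python
import math


def quantum_decision_rule(eigvals, eigvecs, tol=1e-9):
    """Algorithm 2 for real eigenvectors: returns (basis e_1..e_d, decisions)."""
    r, d = len(eigvals), len(eigvals[0])
    table = [row[:] for row in eigvals]            # the matrix Lambda_0
    basis, decisions = [], []

    def residual(v):
        """Component of v orthogonal to span(basis)."""
        out = list(v)
        for e in basis:
            c = sum(x * y for x, y in zip(e, out))
            out = [y - c * x for x, y in zip(e, out)]
        return out

    def norm(v):
        return math.sqrt(sum(x * x for x in v))

    for _ in range(d):
        # Step (i): maximal remaining eigenvalue and Gram-Schmidt step (3.3).
        i_star, j_star = max(((i, j) for i in range(r) for j in range(d)),
                             key=lambda ij: table[ij[0]][ij[1]])
        res = residual(eigvecs[i_star][j_star])
        length = norm(res)
        basis.append([x / length for x in res])
        decisions.append(i_star)
        # Step (ii): mark every eigenvector lying in span(e_1, ..., e_s).
        for i in range(r):
            for j in range(d):
                if table[i][j] != -1 and norm(residual(eigvecs[i][j])) <= tol:
                    table[i][j] = -1
    return basis, decisions


# Hypothetical example: the pure qubit states |0><0| and |+><+|.
h = 1 / math.sqrt(2)
vals = [[1.0, 0.0], [1.0, 0.0]]
vecs = [[[1.0, 0.0], [0.0, 1.0]], [[h, h], [h, -h]]]
print(quantum_decision_rule(vals, vecs))
```

The detector (3.1) is then obtained by summing the projectors onto those basis vectors whose decision equals $i$. For commuting states given in a common eigenbasis, the decisions coincide with those of Algorithm 1.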

To bound the error probability of this detector, we need to introduce some
further notation. In each step $s$ of Algorithm 2, in part (i) we have selected
an index pair $(i^*, j^*)$ where $\lambda_{i^*j^*}$ is a maximal entry of the matrix $\Lambda_{s-1}$;
set $(i(s), j(s)) = (i^*, j^*)$. The sequence of vectors $\{v_{i(s),j(s)}\}_{s=1}^{d}$ is linearly
independent by construction. For each $s \in [1, d]$ define a $d \times s$ matrix $V_s$

(3.4)    $V_s := (v_{i(1),j(1)}, \ldots, v_{i(s),j(s)}),$

that is, the columns of $V_s$ are the vectors $v_{i(k),j(k)}$, $k \in [1, s]$. We refer to the
$s \times s$-matrix

(3.5)    $\Gamma_s := V_s^{*} V_s$

as a Gram matrix of $\{v_{i(k),j(k)}\}_{k=1}^{s}$. For each $s \in [1, d]$ the matrix $\Gamma_s$ is
nonsingular and the matrix

    $P_s := V_s (V_s^{*} V_s)^{-1} V_s^{*} = V_s \Gamma_s^{-1} V_s^{*}$

represents an orthogonal projection onto $\mathrm{span}(v_{i(1),j(1)}, \ldots, v_{i(s),j(s)})$, an
$s$-dimensional subspace of $\mathbb{C}^d$. Additionally, we set $P_0 = 0$ and define for $s \in [1, d]$

(3.6)    $P^{(s)} := P_s - P_{s-1}.$

Observe that the $P^{(s)}$ represent one-dimensional orthogonal projectors, which
are mutually orthogonal, such that $P^{(s)} = |e_s\rangle\langle e_s|$ for the unit vectors $e_s$
defined in (3.3). The latter can be taken to be $e_s = \|P^{(s)} v_{i(s),j(s)}\|^{-1} P^{(s)} v_{i(s),j(s)}$
(or a sign changed version).

Furthermore, define an index $N$ as

(3.7)    $N = \max\{s \in [1, d] : \lambda_{i(s),j(s)} > 0\}.$

It can be seen from the proof of Lemma 1 below that if $N < d$, then $N$ can
serve as an early stopping index for Algorithm 2, in the following sense: the
obtained set of orthonormal vectors $\{e_s\}_{s=1}^{N}$ can be completed to a basis
of $\mathbb{C}^d$ in an arbitrary way and the decisions $\varphi(e_s)$, $s > N$, can be taken
arbitrarily. This is related to the fact that for all further steps $s > N$, the
remaining eigenvalues $\lambda_{ij}$ listed in the matrix $\Lambda_N$ are 0; in Algorithm 1 this
corresponds to the case that there exist $\omega \in \Omega$ which are outside the support
of all $P_i$.

We use the notation $\lambda_{\min}(\cdot)$ for the minimal eigenvalue of a self-adjoint
matrix.

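Lemma 1. Let $\Sigma = \{\rho_i\}_{i=1}^{r}$ be an arbitrary set of density matrices on $\mathbb{C}^d$.
Then the detector $E = \{E_i\}_{i=1}^{r}$ constructed in Algorithm 2 fulfills

(3.8)    $\mathrm{Err}(E) \le \lambda_{\min}^{-1}(\Gamma_N)\, r^{-1} \sum_{1 \le i, m \le r,\ i \neq m}\ \inf_{s \in [0,1]} \mathrm{tr}[\rho_i^{1-s} \rho_m^{s}],$

where $\Gamma_N$ is the Gram matrix according to (3.5) for index $s = N$ defined
in (3.7).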

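Proof. Define $J$ to be the subset of $[1, r] \times [1, d]$ consisting of all pairs
$(i(s), j(s))$, $s \in [1, d]$, and $J_i := \{j \in [1, d] : (i, j) \in J\}$. For given $i \in [1, r]$,
consider the corresponding individual success probability of the detector
defined by (3.1):

(3.9)    $\mathrm{Succ}_i(E) = \mathrm{tr}[\rho_i E_i] = \sum_{j=1}^{d} \lambda_{ij} \mathrm{tr}[|v_{ij}\rangle\langle v_{ij}| E_i] \ge \sum_{j \in J_i} \lambda_{ij} \mathrm{tr}[|v_{ij}\rangle\langle v_{ij}| E_i],$

where the right-hand side is set to 0 if the set $J_i$ is empty. For any $j \in J_i$,
let $s(i,j)$ be the unique index $s \in [1, d]$ such that $(i, j) = (i(s), j(s))$. If $J_i$ is
nonempty, then

    $E_i = \sum_{j \in J_i} P^{(s(i,j))}$

with $P^{(s)}$ defined in (3.6), hence $E_i \ge P^{(s(i,j))}$ for all $j \in J_i$, in the sense of
the ordering for self-adjoint matrices. This implies

    $\mathrm{tr}[|v_{ij}\rangle\langle v_{ij}| E_i] \ge \langle v_{ij}| P^{(s(i,j))} |v_{ij}\rangle = \langle v_{ij}| P_{s(i,j)} |v_{ij}\rangle - \langle v_{ij}| P_{s(i,j)-1} |v_{ij}\rangle.$

Recall that the matrices $P_s$ are constructed as orthogonal projectors onto
$\mathrm{span}(v_{i(1),j(1)}, \ldots, v_{i(s),j(s)})$, and since for $j \in J_i$ and $s = s(i,j)$ we have
$v_{ij} = v_{i(s),j(s)}$, it follows that for $s = s(i,j)$

    $\langle v_{ij}| P_{s(i,j)} |v_{ij}\rangle = \langle v_{i(s),j(s)}| P_s |v_{i(s),j(s)}\rangle = 1.$

Consequently,

    $\mathrm{Succ}_i(E) \ge \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| P^{(s(i,j))} |v_{ij}\rangle = \sum_{j \in J_i} \lambda_{ij} - \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| P_{s(i,j)-1} |v_{ij}\rangle.$

For the individual error probability under state $\rho_i$ this implies, setting
$J_i^{c} := [1, d] \setminus J_i$,

(3.10)    $\mathrm{Err}_i(E) = 1 - \mathrm{Succ}_i(E) \le \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| P_{s(i,j)-1} |v_{ij}\rangle + \sum_{j \in J_i^{c}} \lambda_{ij} =: S_1 + S_2,$

say.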

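Bounding the term $S_1$. Consider only those terms in

    $S_1 = \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| P_{s(i,j)-1} |v_{ij}\rangle$

where $\lambda_{ij} > 0$. Since for $j \in J_i$ we have $\lambda_{ij} = \lambda_{i(s),j(s)}$ for some $s = s(i,j) \in
[1, d]$, the assumption $\lambda_{ij} > 0$ implies $s(i,j) \le N$. Recall that $P_s = V_s \Gamma_s^{-1} V_s^{*}$,
$s = 1, \ldots, d$, and that each $\Gamma_{s-1}$ is a principal submatrix of $\Gamma_s$. As a
consequence, $\lambda_{\min}(\Gamma_s) \ge \lambda_{\min}(\Gamma_N)$, $s \in [1, N]$, and for $j \in J_i$, if not $s(i,j) = 1$,

(3.11)    $P_{s(i,j)-1} \le \lambda_{\min}^{-1}(\Gamma_{s(i,j)-1})\, V_{s(i,j)-1} V_{s(i,j)-1}^{*} \le \lambda_{\min}^{-1}(\Gamma_N)\, V_{s(i,j)-1} V_{s(i,j)-1}^{*},$

where $\lambda_{\min}(\Gamma_N) > 0$ by construction. Formally setting $V_0 = 0 \in \mathbb{C}^d$, the above
inequality holds also if $s(i,j) = 1$. One obtains the upper bound

(3.12)    $S_1 = \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| P_{s(i,j)-1} |v_{ij}\rangle$
          $\le \lambda_{\min}^{-1}(\Gamma_N) \sum_{j \in J_i} \lambda_{ij} \langle v_{ij}| V_{s(i,j)-1} V_{s(i,j)-1}^{*} |v_{ij}\rangle$
          $= \lambda_{\min}^{-1}(\Gamma_N) \sum_{j \in J_i} \lambda_{ij} \sum_{k=1}^{s(i,j)-1} |\langle v_{i(k),j(k)} | v_{ij}\rangle|^{2}.$

The identity above is based on the fact that the columns of $V_{s(i,j)-1}$ are given
by the vectors $v_{i(k),j(k)}$, $k \in [1, s(i,j) - 1]$. Note that in (3.12), for every pair
of vectors occurring in $\langle v_{i(k),j(k)} | v_{ij}\rangle$ the corresponding eigenvalues satisfy
$\lambda_{i(k),j(k)} \ge \lambda_{ij}$ by construction. This implies

(3.13)    $\lambda_{ij} \le \lambda_{ij}^{1-s} \lambda_{i(k),j(k)}^{s}$

for every $s \in [0, 1]$. Recall that every eigenvalue $\lambda_{i(k),j(k)}$ pertains to a state
$\rho_{i(k)}$; we may assume $i(k) \neq i$, since otherwise necessarily $j(k) \neq j$ and
thus $\langle v_{i(k),j(k)} | v_{i(k),j}\rangle = 0$. Setting now $m = i(k)$ and assuming $m \neq i$, we will
apply inequality (3.13) for an exponent $s$ which is allowed to depend on $i$
and $m$. Denote by $s(i,m) = s(m,i) \in [0, 1]$ the exponent associated to the
pair of indices $(i, m) \in [1, r]^{2}$. Observe that for any subset $D_m \subset [1, d]$

(3.14)    $\sum_{j \in J_i} \sum_{j' \in D_m} \lambda_{ij}^{1-s(i,m)} \lambda_{m,j'}^{s(i,m)} |\langle v_{m,j'} | v_{ij}\rangle|^{2} \le \sum_{j \in J_i} \sum_{j'=1}^{d} \lambda_{ij}^{1-s(i,m)} \lambda_{m,j'}^{s(i,m)} |\langle v_{m,j'} | v_{ij}\rangle|^{2},$

where on the right-hand side of the inequality we are just adding positive
reals. It now follows from (3.12), (3.13) and (3.14) that

(3.15)    $S_1 \le \lambda_{\min}^{-1}(\Gamma_N) \sum_{j \in J_i} \sum_{1 \le m \le r,\ m \neq i} \sum_{j'=1}^{d} \lambda_{ij}^{1-s(i,m)} \lambda_{m,j'}^{s(i,m)} |\langle v_{m,j'} | v_{ij}\rangle|^{2}.$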

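Bounding the term $S_2$. We have

    $S_2 = \sum_{j \in J_i^{c}} \lambda_{ij} = \sum_{j \in J_i^{c}} \lambda_{ij} \langle v_{ij} | v_{ij}\rangle.$

Consider only those terms where $\lambda_{ij} > 0$. By definition of $J_i^{c}$, there exists
$s \in [1, d]$ such that $v_{ij} \in \mathrm{span}(v_{i(1),j(1)}, \ldots, v_{i(s),j(s)})$. Then $\lambda_{i(k),j(k)} \ge \lambda_{ij}$ for
$k \in [1, s]$, hence $\lambda_{i(s),j(s)} > 0$ and consequently $s \le N$. We also have $\langle v_{ij} | v_{ij}\rangle =
\langle v_{ij}| P_s |v_{ij}\rangle$, so the same reasoning as for $S_1$ leads to

(3.16)    $S_2 \le \lambda_{\min}^{-1}(\Gamma_N) \sum_{j \in J_i^{c}} \sum_{1 \le m \le r,\ m \neq i} \sum_{j'=1}^{d} \lambda_{ij}^{1-s(i,m)} \lambda_{m,j'}^{s(i,m)} |\langle v_{m,j'} | v_{ij}\rangle|^{2}.$

Putting together (3.15) and (3.16), we obtain

    $\mathrm{Err}_i(E) \le \lambda_{\min}^{-1}(\Gamma_N) \sum_{j=1}^{d} \sum_{1 \le m \le r,\ m \neq i} \sum_{j'=1}^{d} \lambda_{ij}^{1-s(i,m)} \lambda_{m,j'}^{s(i,m)} |\langle v_{m,j'} | v_{ij}\rangle|^{2}.$

Since $s(i,m)$, $m \neq i$, are arbitrary in $[0, 1]$, we obtain

    $\mathrm{Err}_i(E) \le \lambda_{\min}^{-1}(\Gamma_N) \sum_{1 \le m \le r,\ m \neq i}\ \inf_{s \in [0,1]} \mathrm{tr}[\rho_i^{1-s} \rho_m^{s}].$

By averaging over $i \in [1, r]$ we obtain (3.8). □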
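The passage from the triple sum over eigenvalues to the trace expression in the last step of the proof rests only on the spectral decompositions (3.2). As a sketch, for $0 < s < 1$ (the endpoint cases depend on the convention chosen for the zeroth power of a singular matrix, which is not fixed here):

```latex
\rho_i^{1-s}\rho_m^{s}
  = \sum_{j=1}^{d}\sum_{j'=1}^{d} \lambda_{ij}^{1-s}\,\lambda_{mj'}^{s}\,
    |v_{ij}\rangle\langle v_{ij}|v_{mj'}\rangle\langle v_{mj'}|,
\qquad
\mathrm{tr}\bigl[\rho_i^{1-s}\rho_m^{s}\bigr]
  = \sum_{j=1}^{d}\sum_{j'=1}^{d} \lambda_{ij}^{1-s}\,\lambda_{mj'}^{s}\,
    \bigl|\langle v_{mj'}|v_{ij}\rangle\bigr|^{2}.
```

The second identity uses $\mathrm{tr}[|v_{ij}\rangle\langle v_{mj'}|] = \langle v_{mj'}|v_{ij}\rangle$.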

4. Pairwise linearly independent states. The main difficulty for utilizing
Lemma 1 for an asymptotic error bound is the control of the minimal
eigenvalue of the Gram matrix $\Gamma_N$. Imposing Condition (LI) on the
set $\Sigma = \{\rho_1, \ldots, \rho_r\}$ is one way to achieve that control, resulting in
Theorem 2. Observe that this condition is equivalent to requiring that for each
pair $\rho_i, \rho_j$, $i \neq j$, the joint set of eigenvectors pertaining to a nonzero
eigenvalue is linearly independent. Lemma 2 below implies in this case: the Gram
matrix $\Gamma_N$ associated to the tensor product set $\Sigma^{\otimes n} = \{\rho_1^{\otimes n}, \ldots, \rho_r^{\otimes n}\}$ has
minimal eigenvalue bounded away from zero as $n \to \infty$.
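The mechanism behind this statement is easiest to see for two pure states, which is a special case assumed here only for illustration. Overlaps of tensor powers are powers of the original overlap, so the off-diagonal entry of the $2 \times 2$ Gram matrix decays geometrically and its smallest eigenvalue tends to one. The helper names below are hypothetical and the vectors are taken to be real.

```python
import math
from functools import reduce


def kron(u, v):
    """Tensor (Kronecker) product of two vectors given as lists."""
    return [a * b for a in u for b in v]


def tensor_power(v, n):
    """n-fold tensor power of the vector v, for n >= 1."""
    return reduce(kron, [v] * n)


def inner(u, v):
    return sum(a * b for a, b in zip(u, v))


def gram_lambda_min(u, v):
    """Smallest eigenvalue of [[1, c], [c, 1]] with c = <u|v>, for real unit vectors."""
    return 1 - abs(inner(u, v))


u = [1.0, 0.0]
v = [math.cos(math.pi / 3), math.sin(math.pi / 3)]   # overlap <u|v> = 1/2
for n in (1, 2, 4, 8):
    print(n, gram_lambda_min(tensor_power(u, n), tensor_power(v, n)))
```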

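For each of the original $\rho_i$, let $d_i := \mathrm{rank}(\rho_i)$ be the number of nonzero
eigenvalues. Condition (LI) implies that for any $i \neq j$ we have $d_i + d_j \le d$, and
since $d_i \ge 1$ this implies that all $d_i < d$. In this case $\mathrm{rank}(\rho_i^{\otimes n}) = d_i^{n} < d^{n}$.
Let $V_n$ be the set of eigenvectors of $\rho_1^{\otimes n}, \ldots, \rho_r^{\otimes n}$ pertaining to a nonzero
eigenvalue; more precisely, if we assume spectral representations

    $\rho_i^{\otimes n} = \sum_{j=1}^{\mathrm{rank}(\rho_i^{\otimes n})} \lambda_{ij} |v_{ij}\rangle\langle v_{ij}|$

with unit vectors $v_{ij}$ and eigenvalues $\lambda_{ij} > 0$, then $V_n$ is the double array

    $V_n = \{v_{ij},\ j \in [1, d_i^{n}],\ i \in [1, r]\}$

so that $\# V_n = D_n := \sum_{i=1}^{r} d_i^{n}$.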

Lemma 2. Let $S = \{\rho_1, \ldots, \rho_r\}$ be a set of density matrices in $\mathbb{C}^d$, fulfilling Condition (LI). Let $V_n$ be the set of eigenvectors defined above and let $\mathring{\Gamma}_n$ be its $D_n \times D_n$ Gram matrix. Then

$$\lambda_{\min}(\mathring{\Gamma}_n) = 1 + o(1) \quad \text{as } n \to \infty. \tag{4.1}$$

Proof. We will first argue for the generic case $n = 1$, and subsequently impose the tensor product structure on the $\rho_i$. As above, let $\{v_{ij}\}_{j=1}^{d_i}$ be the eigenvectors of $\rho_i$ pertaining to a nonzero eigenvalue. Define a $d \times d_i$ matrix

$$U_i := (v_{i1}, \ldots, v_{i d_i}), \tag{4.2}$$

that is, the columns of $U_i$ are the vectors $v_{ij}$, $j \in [1, d_i]$. Furthermore, define a $d \times D$ matrix (where $D = \sum_{i=1}^{r} d_i$)

$$U := (U_1 | \cdots | U_r)$$

made up of submatrices $U_i$. Now, for $n > 1$ replace the matrices $U_i$ in (4.2) by their $n$th tensor powers $U_i^{\otimes n}$. Then for $n > 1$ the $d^n \times d_i^n$ blocks $U_i^{\otimes n}$ correspond to eigenvectors of $\rho_i^{\otimes n}$, and $U$ is now of dimension $d^n \times D_n$ where $D_n = \sum_{i=1}^{r} d_i^n$. For the $D_n \times D_n$ Gram matrix $\mathring{\Gamma}_n := U^* U$ we show (4.1).

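We will again begin with the case $n = 1$ and develop a representation of $U$ which takes account of its block structure in terms of $U_i^* U_j$. To this end, for $i \in [1,r]$ define $d_i \times D$ matrices

$$E_i = (0_{d_i \times d_1} | \cdots | 0_{d_i \times d_{i-1}} | 1_{d_i} | 0_{d_i \times d_{i+1}} | \cdots | 0_{d_i \times d_r}),$$

where we denote a $k \times l$ matrix of 0's by $0_{k \times l}$ and the $k$-dimensional unit matrix by $1_k$. Then it is easily seen that $U = \sum_{i=1}^{r} U_i E_i$ and consequently

$$\mathring{\Gamma}_1 = U^* U = \sum_{i,j=1}^{r} E_i^* U_i^* U_j E_j. \tag{4.3}$$

Here $U_i^* U_i = 1_{d_i}$, $i \in [1,r]$, so that

$$\sum_{i=1}^{r} E_i^* U_i^* U_i E_i = \sum_{i=1}^{r} E_i^* E_i = 1_D.$$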

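We define

$$\Delta := \mathring{\Gamma}_1 - 1_D \tag{4.4}$$

and write $\mathring{\Gamma}_1 = 1_D + \Delta$. Moreover, for $j < i$ we define

$$\Delta_{ij} = E_i^* U_i^* U_j E_j + E_j^* U_j^* U_i E_i. \tag{4.5}$$

Clearly $\Delta_{ij}$ is Hermitian, and by construction $\Delta = \sum_{i=2}^{r} \sum_{j=1}^{i-1} \Delta_{ij}$. Now, with $\|a\| = \lambda_{\max}^{1/2}(a^2)$ being the operator norm of a Hermitian matrix $a$, we have

$$\lambda_{\min}(\mathring{\Gamma}_1) = \min_{\|v\|=1} \langle v|1_D + \Delta|v\rangle = 1 + \min_{\|v\|=1} \langle v|\Delta|v\rangle \ge 1 - \|\Delta\| \ge 1 - \sum_{i=2}^{r} \sum_{j=1}^{i-1} \|\Delta_{ij}\| = 1 - \sum_{i=2}^{r} \sum_{j=1}^{i-1} \lambda_{\max}^{1/2}(\Delta_{ij}^2), \tag{4.6}$$

where the second inequality is by the triangle inequality for the operator norm.

For the case $n > 1$, replacing the matrices $U_i$ in (4.2) by their $n$th tensor powers $U_i^{\otimes n}$ leads to a representation of $\mathring{\Gamma}_n$ analogous to (4.3). Here the matrices $E_i$ have to be replaced by $E_{i,n}$, defined analogously to $E_i$ with $d_i$ replaced by $d_i^n$, $i \in [1,r]$. Furthermore, we define $\Delta_n$ and $\Delta_{ij,n}$ analogously to (4.4) and (4.5) with $U_i$, $E_i$ replaced by $U_i^{\otimes n}$ and $E_{i,n}$. In order to prove (4.1) we use the analog of (4.6) holding for $\mathring{\Gamma}_n$ and $\Delta_n$, which is

$$\lambda_{\min}(\mathring{\Gamma}_n) \ge 1 - \sum_{i=2}^{r} \sum_{j=1}^{i-1} \lambda_{\max}^{1/2}(\Delta_{ij,n}^2).$$

It now suffices to show that for all $i \in [2,r]$, $j \in [1, i-1]$

$$\lambda_{\max}^{1/2}(\Delta_{ij,n}^2) \to 0 \quad \text{as } n \to \infty. \tag{4.7}$$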

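Clearly, we have

$$\Delta_{ij,n} = E_{i,n}^* (U_i^* U_j)^{\otimes n} E_{j,n} + E_{j,n}^* (U_j^* U_i)^{\otimes n} E_{i,n}$$

and by a computation, since $E_{i,n} E_{i,n}^* = 1_{d_i^n}$ and $E_{j,n} E_{i,n}^* = 0_{d_j^n \times d_i^n}$ for $j < i$,

$$\Delta_{ij,n}^2 = E_{i,n}^* (U_i^* U_j U_j^* U_i)^{\otimes n} E_{i,n} + E_{j,n}^* (U_j^* U_i U_i^* U_j)^{\otimes n} E_{j,n}.$$

The two Hermitian matrices composing $\Delta_{ij,n}^2$ are orthogonal, and their nonzero eigenvalues are those of $(U_i^* U_j U_j^* U_i)^{\otimes n}$ and $(U_j^* U_i U_i^* U_j)^{\otimes n}$, respectively. Hence,

$$\lambda_{\max}(\Delta_{ij,n}^2) = \max\{\lambda_{\max}((U_i^* U_j U_j^* U_i)^{\otimes n}),\, \lambda_{\max}((U_j^* U_i U_i^* U_j)^{\otimes n})\} = \max\{\lambda_{\max}^n(U_i^* U_j U_j^* U_i),\, \lambda_{\max}^n(U_j^* U_i U_i^* U_j)\}. \tag{4.8}$$

Let $P_i = U_i U_i^*$ be the projection operator onto the space $\operatorname{supp}(\rho_i) = \operatorname{span}(U_i)$, so that $U_i^* U_j U_j^* U_i = U_i^* P_j U_i$. Note that $U_i^* P_j U_i$ and $P_i P_j P_i$ have the same set of nonzero eigenvalues, hence by Lemma 3 below and Condition (LI) we have $\lambda_{\max}(U_i^* P_j U_i) < 1$ and $\lambda_{\max}(U_j^* P_i U_j) < 1$. It follows that

$$\lambda_{\max}^n(U_i^* P_j U_i) \to 0 \quad \text{as } n \to \infty,$$

$$\lambda_{\max}^n(U_j^* P_i U_j) \to 0 \quad \text{as } n \to \infty,$$

hence by (4.8) $\lambda_{\max}(\Delta_{ij,n}^2) \to 0$. Thus, (4.7) is established. □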
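The convergence (4.1) can be illustrated in the simplest situation covered by Lemma 2: two pure states ($r = 2$, $d_1 = d_2 = 1$) with linearly independent, non-orthogonal unit vectors. The following is only a sketch under that assumption; the angle `theta` and all function names are hypothetical choices made for illustration and do not come from the paper. The tensor powers are formed explicitly, and the smallest eigenvalue of their $2 \times 2$ Gram matrix is seen to approach 1 as $n$ grows.

```python
import math

def kron(u, v):
    """Kronecker (tensor) product of two real vectors given as lists."""
    return [a * b for a in u for b in v]

def tensor_power(v, n):
    """n-fold tensor power of v, for n >= 1."""
    out = list(v)
    for _ in range(n - 1):
        out = kron(out, v)
    return out

def dot(u, v):
    """Euclidean inner product of two real vectors."""
    return sum(a * b for a, b in zip(u, v))

def lambda_min_gram(w1, w2):
    """Smallest eigenvalue of the Gram matrix [[1, g], [g, 1]] of two unit vectors."""
    g = dot(w1, w2)
    return 1.0 - abs(g)

# Hypothetical example: two unit vectors in R^2 at an angle of 30 degrees.
theta = math.pi / 6
v1 = [1.0, 0.0]
v2 = [math.cos(theta), math.sin(theta)]

def lambda_min_for_n(n):
    """Smallest Gram eigenvalue for the n-th tensor powers of v1 and v2."""
    return lambda_min_gram(tensor_power(v1, n), tensor_power(v2, n))

for n in (1, 2, 4, 8):
    print(n, lambda_min_for_n(n))
```

In this special case the overlap of the tensor powers is the $n$th power of the original overlap, which is the mechanism behind (4.8).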

Lemma 3. Let $\mathcal{L}_0, \mathcal{L}_1$ be linear subspaces of $\mathbb{C}^d$ and $P_0, P_1$ be the corresponding projection operators. Then $\mathcal{L}_0 \cap \mathcal{L}_1 = \{0\}$ if and only if

$$\lambda_{\max}(P_0 P_1 P_0) < 1.$$

Proof. It is obvious that always $\lambda_{\max}(P_0 P_1 P_0) \le 1$, so it suffices to prove that $\mathcal{L}_0 \cap \mathcal{L}_1 \ne \{0\}$ is equivalent to $\lambda_{\max}(P_0 P_1 P_0) = 1$. Assume there exists $x \in \mathcal{L}_0 \cap \mathcal{L}_1$, $\|x\| > 0$; then $P_i x = x$, $i = 0, 1$, and hence $P_0 P_1 P_0 x = x$, so that $\lambda_{\max}(P_0 P_1 P_0) = 1$. For the other direction, assume

$$\lambda_{\max}(P_0 P_1 P_0) = 1. \tag{4.9}$$

Then there exists $v_0 \in \mathbb{C}^d$, $\|v_0\| = 1$, such that $\langle v_0|P_0 P_1 P_0|v_0\rangle = 1$. Here $\|P_0 v_0\| \le 1$ by the properties of projections. Assume $\|P_0 v_0\| < 1$. Then for $u_0 = P_0 v_0$ we have

$$\langle v_0|P_0 P_1 P_0|v_0\rangle = \langle u_0|P_1|u_0\rangle < 1,$$

which contradicts the assumption (4.9). Hence, we must have $\|P_0 v_0\| = 1$ and hence $v_0 \in \mathcal{L}_0$ and $P_0 v_0 = v_0$. Then

$$1 = \langle v_0|P_0 P_1 P_0|v_0\rangle = \langle v_0|P_1|v_0\rangle,$$

which implies $v_0 \in \mathcal{L}_1$ by an analogous reasoning. Hence, $v_0 \in \mathcal{L}_0 \cap \mathcal{L}_1$ where $\|v_0\| = 1$, hence $\mathcal{L}_0 \cap \mathcal{L}_1 \ne \{0\}$. □
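As a numerical sanity check of Lemma 3, one may take two one-dimensional subspaces of $\mathbb{R}^2$ spanned by unit vectors at an angle `theta`. This is a minimal sketch under that assumption; the helper names and the use of power iteration are illustrative choices, not part of the paper. The largest eigenvalue of $P_0 P_1 P_0$ equals 1 only when the two lines coincide.

```python
import math

def outer(u):
    """Rank-one projection |u><u| for a real unit vector u."""
    return [[a * b for b in u] for a in u]

def matmul(a, b):
    """Product of two square matrices given as nested lists."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]

def matvec(m, v):
    """Matrix-vector product."""
    return [sum(row[j] * v[j] for j in range(len(v))) for row in m]

def lambda_max_psd(m, iters=200):
    """Largest eigenvalue of a symmetric positive semidefinite matrix
    by power iteration, started from a fixed generic vector."""
    v = [1.0 + 0.1 * i for i in range(len(m))]
    lam = 0.0
    for _ in range(iters):
        w = matvec(m, v)
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            return 0.0
        v = [x / norm for x in w]
        lam = sum(a * b for a, b in zip(v, matvec(m, v)))
    return lam

def lambda_max_p0p1p0(theta):
    """lambda_max(P0 P1 P0) for two lines in R^2 at angle theta."""
    p0 = outer([1.0, 0.0])
    p1 = outer([math.cos(theta), math.sin(theta)])
    return lambda_max_psd(matmul(matmul(p0, p1), p0))

for theta in (0.0, math.pi / 4, math.pi / 2):
    print(theta, lambda_max_p0p1p0(theta))
```

For `theta = 0` the subspaces coincide and the eigenvalue is 1; for any other angle in $(0, \pi)$ the intersection is trivial and the eigenvalue is strictly smaller than 1.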

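Proof of Theorem 2. We utilize the detector constructed in Algorithm 2, applied to the tensor product case $S^{\otimes n}$; call this detector $E^{(n)}$. Lemma 2 implies that the set $V_n$ is a linearly independent set for sufficiently large $n$. As a consequence, when Lemma 1 is applied to the tensor product set $S^{\otimes n} = \{\rho_1^{\otimes n}, \ldots, \rho_r^{\otimes n}\}$, the matrix $\Gamma_N$ occurring there equals $\mathring{\Gamma}_n$ up to a rearrangement and $\lambda_{\min}(\Gamma_N) = \lambda_{\min}(\mathring{\Gamma}_n)$. We find from (3.8) that

$$\mathrm{Err}(E^{(n)}) \le \lambda_{\min}^{-1}(\mathring{\Gamma}_n)\, r^{-1} \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[(\rho_i^{\otimes n})^{1-s} (\rho_j^{\otimes n})^{s}] = r^{-1}(1 + o(1)) \sum_{1 \le i,j \le r,\, j \ne i} \Big( \inf_{s \in [0,1]} \operatorname{tr}[\rho_i^{1-s} \rho_j^{s}] \Big)^{n}. \tag{4.10}$$

Recall the definition (2.1) of the pairwise quantum Chernoff bound $\xi_{QCB}(\rho_i, \rho_j)$; then

$$\mathrm{Err}(E^{(n)}) \le r^{-1}(1 + o(1)) \sum_{1 \le i,j \le r,\, j \ne i} \exp(-n\, \xi_{QCB}(\rho_i, \rho_j)). \tag{4.11}$$

Taking log of both sides and dividing by $n$, the limit of the right-hand side above is determined by the smallest of the $\xi_{QCB}(\rho_i, \rho_j)$, which according to (2.2) coincides with $\xi_{QCB}(S)$. The theorem follows. □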
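The last step of the proof rests on the fact that a finite sum of exponentials decays at the rate of its slowest term. A small sketch, with made-up pairwise exponents standing in for the $\xi_{QCB}(\rho_i, \rho_j)$ (the numbers and function name are hypothetical), shows $-\frac{1}{n}\log\sum_k \exp(-n\xi_k)$ approaching $\min_k \xi_k$:

```python
import math

def empirical_rate(exponents, n):
    """Return -(1/n) * log(sum_k exp(-n * xi_k)), computed stably by
    factoring out the dominant term exp(-n * min_k xi_k)."""
    m = min(exponents)
    s = sum(math.exp(-n * (x - m)) for x in exponents)
    return m - math.log(s) / n

# Hypothetical pairwise exponents; the smallest one is 0.3.
xis = [0.3, 0.5, 0.9]
for n in (1, 10, 100, 1000):
    print(n, empirical_rate(xis, n))
```

The constant prefactor $r^{-1}(1 + o(1))$ in (4.11) contributes only an $o(1)$ term after dividing the logarithm by $n$, so it does not affect the limit.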

5. Commuting states. Suppose all the density matrices $\rho_i$ are commuting: $\rho_i \rho_j = \rho_j \rho_i$ for all $i, j \in [1,r]$. Then the $\rho_i$ have a common set of eigenvectors $v_j$, $j \in [1,d]$. The spectral decompositions (3.2) now are

$$\rho_i = \sum_{j=1}^{d} \lambda_{i,j} |v_j\rangle\langle v_j|, \quad i \in [1,r].$$

Also, w.l.o.g., by applying a unitary transformation, we can assume that all $\rho_i$ are diagonal matrices and $v_j$ is a canonical basis vector of $\mathbb{C}^d$. Then the set of eigenvalues of $\rho_i$ represents a probability distribution $P_i$ on a finite sample space $\Omega$, $\#\Omega = d$, where each $\omega \in \Omega$ can be identified with one of the projections $|v_j\rangle\langle v_j|$.

With this identification, Algorithm 2 reduces essentially to Algorithm 1. Indeed, in the orthogonalization step (3.3), the newly appearing unit vector $v_{i^* j^*}$ in step $s$ is one of the basis vectors $v_j$. By induction, it follows that the constructed basis $e_1, \ldots, e_d$ coincides with $v_1, \ldots, v_d$ up to possible reindexing and change of sign. Thus, the classical decision rule $\varphi$ found in Algorithm 2 on the sample space elements $|e_j\rangle\langle e_j|$ is equivalent to a decision rule on $\Omega$, constructed according to Algorithm 1, and the latter is a maximum likelihood rule. The ML rule is not unique in general; in case of nonuniqueness, any version may result from Algorithm 1, according to the choice of a maximal entry in step (i).

In Lemma 1, $\Gamma_N$ is the Gram matrix pertaining to $\{v_j\}_{j=1}^{d}$, that is, the unit matrix. Thus, we obtain

$$\mathrm{Err}(\varphi) \le r^{-1} \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[\rho_i^{1-s} \rho_j^{s}]$$

and reasoning further as in (4.10) and (4.11), we have thus reproduced the attainability result for the multiple classical Chernoff bound (cf. (1.2) and [28, 29]).
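In the commuting case the trace expression reduces to a classical sum over the sample space, $\operatorname{tr}[\rho_i^{1-s}\rho_j^{s}] = \sum_{\omega} P_i(\omega)^{1-s} P_j(\omega)^{s}$. The sketch below approximates the infimum over $s \in [0,1]$ by a grid search. It assumes strictly positive probability vectors (to avoid conventions for $0^0$); the distributions, the grid size and the function name are hypothetical and serve only as illustration.

```python
def chernoff_coefficient(p, q, grid=1000):
    """Grid approximation of inf over s in [0, 1] of sum_w p(w)^(1-s) * q(w)^s,
    for strictly positive probability vectors p and q of equal length."""
    best = 1.0
    for k in range(grid + 1):
        s = k / grid
        val = sum(a ** (1.0 - s) * b ** s for a, b in zip(p, q))
        best = min(best, val)
    return best

# Hypothetical distributions on a two-point sample space.
p = [0.5, 0.5]
q = [0.9, 0.1]
print(chernoff_coefficient(p, q))
```

The negative logarithm of the returned value approximates the classical pairwise Chernoff exponent for the two distributions.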

6. A near optimal rate in the general case. We establish that, as stated in Theorem 3, in the general case of a finite number of quantum hypotheses there exist quantum tests that achieve an error exponent equal to the generalized quantum Chernoff distance up to a factor 1/3.

To construct the detector attaining the exponential bound in the general case, we will modify Algorithm 2 such that it assumes certain density matrices $\tilde{\rho}_i$, which represent $\varepsilon$-perturbations of embeddings of the original $\rho_i$ into a higher-dimensional space $\mathbb{C}^D$, $D > d$. These states $\tilde{\rho}_i$ are not observable; the detector will be applied to the extensions of the $\rho_i$, which are observable.

Set $D = (r+1)d$ and consider $f_k$, the $k$th canonical unit vector in the $(r+1)d$-dimensional space $\mathbb{C}^D$. Reindex the basis vectors $f_k$ such that $f_{ij} = f_{(i-1)d+j}$ for $(i,j) \in [1, r+1] \times [1,d]$ and define subspaces

$$S_i = \operatorname{span}\{f_{ij}\}_{j=1}^{d}.$$

Then $\mathbb{C}^D$ is a direct sum $\mathbb{C}^D = \bigoplus_{i=1}^{r+1} S_i$ where all $S_i$ are isomorphic to $\mathbb{C}^d$. Let the operator $F$ represent the canonical embedding $F : \mathbb{C}^d \to S_1$. Recall the spectral representation (3.2) of $\rho_i$ with eigenvectors $v_{ij} \in \mathbb{C}^d$; setting $u_{ij} = F v_{ij}$, we may equivalently assume that instead of $\rho_i$ we measure a $D \times D$ density matrix $\rho_{0,i}$ having spectral representation

$$\rho_{0,i} = \sum_{j=1}^{d} \lambda_{i,j} |u_{ij}\rangle\langle u_{ij}|.$$

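For $\varepsilon \in (0,1)$ and $\delta_\varepsilon = (1 - \varepsilon^2)^{1/2}$, define vectors

$$\tilde{u}_{ij} := \delta_\varepsilon u_{ij} + \varepsilon f_{i+1,j}$$

for $(i,j) \in J = [1,r] \times [1,d]$. Then, since $\langle u_{ij}|f_{i+1,j}\rangle = 0$, the vectors $\tilde{u}_{ij}$ are unit vectors; define density matrices

$$\tilde{\rho}_i = \sum_{j=1}^{d} \lambda_{i,j} |\tilde{u}_{ij}\rangle\langle \tilde{u}_{ij}|, \quad i \in [1,r]. \tag{6.1}$$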

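Relative to this set of density matrices on $\mathbb{C}^D$, satisfying

$$\operatorname{tr}[\tilde{\rho}_i^{1-s} \tilde{\rho}_j^{s}] = \delta_\varepsilon^4 \operatorname{tr}[\rho_i^{1-s} \rho_j^{s}], \quad i \ne j, \tag{6.2}$$

construct a detector according to (3.1) and Algorithm 2, and call this $\tilde{E}_\varepsilon$. Then each $\tilde{E}_{\varepsilon,i}$ is a projection matrix in $\mathbb{C}^D$ and $\sum_{i=1}^{r} \tilde{E}_{\varepsilon,i} = 1_D$. Define now $E_{\varepsilon,i}$ as the upper $d \times d$ submatrix of $\tilde{E}_{\varepsilon,i}$. Then $E_{\varepsilon,i}$ is a positive matrix and $\sum_{i=1}^{r} E_{\varepsilon,i} = 1_d$, so that

$$E_\varepsilon := \{E_{\varepsilon,1}, \ldots, E_{\varepsilon,r}\} \tag{6.3}$$

constitutes a POVM in $\mathbb{C}^d$.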

It should be noted that the $E_{\varepsilon,i}$ are not projections, that is, $E_\varepsilon$ is a general POVM but not a PVM, contrary to the detector constructed in Algorithm 2. However, $E_\varepsilon$ results from a PVM $\tilde{E}_\varepsilon$ in a higher-dimensional space by taking submatrices. This relationship holds between POVMs and PVMs in general, on the basis of Naimark's theorem; cf. Parthasarathy [25] for a discussion.

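Lemma 4. Let $S = \{\rho_i\}_{i=1}^{r}$ be an arbitrary set of density matrices on $\mathbb{C}^d$. For sufficiently small $\varepsilon > 0$, the detector $E_\varepsilon$ constructed in (6.3) fulfills

$$\mathrm{Err}(E_\varepsilon) \le 2\varepsilon + \varepsilon^{-2} r^{-1} \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[\rho_i^{1-s} \rho_j^{s}]. \tag{6.4}$$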
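Proof. Consider the Gram matrix $\tilde{\Gamma}_J$ of the set of vectors $\{\tilde{u}_{ij}, (i,j) \in J\}$. Since for $(i,j) \in J$ and $(k,l) \in J$ we have

$$\langle \tilde{u}_{ij}|\tilde{u}_{kl}\rangle = \delta_\varepsilon^2 \langle u_{ij}|u_{kl}\rangle + \varepsilon^2 \langle f_{i+1,j}|f_{k+1,l}\rangle,$$

it follows that $\tilde{\Gamma}_J$ is a convex combination of two Gram matrices, the second of which is the unit matrix, which implies that

$$\lambda_{\min}(\tilde{\Gamma}_J) \ge \varepsilon^2.$$

Hence, $\{\tilde{u}_{ij}, (i,j) \in J\}$ is a set of $rd$ linearly independent vectors in $\mathbb{C}^D$. Since Algorithm 2 eliminates all eigenvectors pertaining to zero eigenvalues, the resulting sequence of length $L$ contains exactly the vectors $\{\tilde{u}_{ij}, (i,j) \in J\}$ pertaining to nonzero $\lambda_{i,j}$ in (6.1). Their full Gram matrix $\Gamma_L$, as given by (3.5) for $s = L$, is a submatrix of $\tilde{\Gamma}_J$ (after rearrangement) and hence also fulfills

$$\lambda_{\min}(\Gamma_L) \ge \varepsilon^2. \tag{6.5}$$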

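Consider the error probability of the POVM $E_\varepsilon$:

$$\mathrm{Err}(E_\varepsilon) = 1 - r^{-1} \sum_{i=1}^{r} \operatorname{tr}[E_{\varepsilon,i} \rho_i] = 1 - r^{-1} \sum_{i=1}^{r} \operatorname{tr}[\tilde{E}_{\varepsilon,i} \rho_{0,i}] = 1 - r^{-1} \sum_{i=1}^{r} \operatorname{tr}[\tilde{E}_{\varepsilon,i} \tilde{\rho}_i] + r^{-1} \sum_{i=1}^{r} \operatorname{tr}[\tilde{E}_{\varepsilon,i} (\tilde{\rho}_i - \rho_{0,i})]. \tag{6.6}$$

Now according to Lemma 1, (6.5), and (6.2) we have

$$1 - r^{-1} \sum_{i=1}^{r} \operatorname{tr}[\tilde{E}_{\varepsilon,i} \tilde{\rho}_i] \le \varepsilon^{-2} r^{-1} \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[\rho_i^{1-s} \rho_j^{s}].$$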

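For the second term on the right-hand side of (6.6) note that

$$\tilde{\rho}_i - \rho_{0,i} = \sum_{j=1}^{d} \lambda_{i,j} \big( |\tilde{u}_{ij}\rangle\langle \tilde{u}_{ij}| - |u_{ij}\rangle\langle u_{ij}| \big).$$

Here we have

$$\begin{aligned}
|\tilde{u}_{ij}\rangle\langle \tilde{u}_{ij}| - |u_{ij}\rangle\langle u_{ij}|
&= |\delta_\varepsilon u_{ij} + \varepsilon f_{i+1,j}\rangle\langle \delta_\varepsilon u_{ij} + \varepsilon f_{i+1,j}| - |u_{ij}\rangle\langle u_{ij}| \\
&= -\varepsilon^2 |u_{ij}\rangle\langle u_{ij}| + \delta_\varepsilon \varepsilon |u_{ij}\rangle\langle f_{i+1,j}| + \delta_\varepsilon \varepsilon |f_{i+1,j}\rangle\langle u_{ij}| + \varepsilon^2 |f_{i+1,j}\rangle\langle f_{i+1,j}| \\
&= \delta_\varepsilon \varepsilon |u_{ij} + f_{i+1,j}\rangle\langle u_{ij} + f_{i+1,j}| - (\delta_\varepsilon \varepsilon - \varepsilon^2)\big( |u_{ij}\rangle\langle u_{ij}| + |f_{i+1,j}\rangle\langle f_{i+1,j}| \big) - 2\varepsilon^2 |u_{ij}\rangle\langle u_{ij}|.
\end{aligned}$$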

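Since the matrix

$$(\delta_\varepsilon \varepsilon - \varepsilon^2)\big( |u_{ij}\rangle\langle u_{ij}| + |f_{i+1,j}\rangle\langle f_{i+1,j}| \big) + 2\varepsilon^2 |u_{ij}\rangle\langle u_{ij}|$$

is positive for sufficiently small $\varepsilon$, we have

$$|\tilde{u}_{ij}\rangle\langle \tilde{u}_{ij}| - |u_{ij}\rangle\langle u_{ij}| \le \delta_\varepsilon \varepsilon |u_{ij} + f_{i+1,j}\rangle\langle u_{ij} + f_{i+1,j}|,$$

and consequently, using $\tilde{E}_{\varepsilon,i} \le 1_D$ and $\|u_{ij} + f_{i+1,j}\|^2 = 2$,

$$\operatorname{tr}[\tilde{E}_{\varepsilon,i} (\tilde{\rho}_i - \rho_{0,i})] \le \sum_{j=1}^{d} \lambda_{i,j} \operatorname{tr}\big[\tilde{E}_{\varepsilon,i}\, \delta_\varepsilon \varepsilon |u_{ij} + f_{i+1,j}\rangle\langle u_{ij} + f_{i+1,j}|\big] \le \delta_\varepsilon \varepsilon \sum_{j=1}^{d} \lambda_{i,j} \|u_{ij} + f_{i+1,j}\|^2 = 2 \delta_\varepsilon \varepsilon \sum_{j=1}^{d} \lambda_{i,j} \le 2\varepsilon.$$

Combining the bounds on the two terms in (6.6) yields (6.4). □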
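The eigenvalue bound (6.5) used in the proof of Lemma 4 can be checked in an extreme toy case. The sketch below assumes $r = 2$, $d = 1$ and two identical pure states, so that the unperturbed Gram matrix is singular; the function name and the chosen values of `eps` are hypothetical. After the perturbation $\tilde{u}_{i1} = \delta_\varepsilon u + \varepsilon f_{i+1,1}$ in $\mathbb{C}^3$, the smallest Gram eigenvalue equals $\varepsilon^2$ in this example, so the bound is attained.

```python
import math

def perturbed_gram_lambda_min(eps):
    """Smallest Gram eigenvalue for two perturbed copies of the same unit vector.

    Toy setting: r = 2, d = 1, D = (r + 1) * d = 3, and both states have the
    same eigenvector u = (1, 0, 0), embedded in the first coordinate."""
    delta = math.sqrt(1.0 - eps ** 2)
    u1 = [delta, eps, 0.0]   # delta * u + eps * f_2
    u2 = [delta, 0.0, eps]   # delta * u + eps * f_3
    g = sum(a * b for a, b in zip(u1, u2))   # off-diagonal Gram entry
    # Both vectors have unit norm, so the Gram matrix is [[1, g], [g, 1]],
    # whose eigenvalues are 1 - |g| and 1 + |g|.
    return 1.0 - abs(g)

for eps in (0.5, 0.1, 0.01):
    print(eps, perturbed_gram_lambda_min(eps), eps ** 2)
```

The price of the perturbation is the factor $\varepsilon^{-2}$ in (6.4), which is balanced against the term $2\varepsilon$ in the proof of Theorem 3.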

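Proof of Theorem 3. We denote the factor of $\varepsilon^{-2}$ in (6.4) by $K_1$, and in the $n$-fold tensor product case, where $\rho_i$ is replaced by $\rho_i^{\otimes n}$, by $K_n$, respectively. To find the best upper bound in (6.4), we minimize the expression $2\varepsilon + \varepsilon^{-2} K_n$ in $\varepsilon$. The solution is $\varepsilon = K_n^{1/3}$ and the value at the minimum is $3 K_n^{1/3}$. Since $K_n$ tends to zero as $n$ goes to infinity, it is ensured that for sufficiently large $n$ the value $K_n^{1/3}$ is small enough to satisfy the condition of Lemma 4. Thus from (6.4) we obtain

$$\mathrm{Err}(E^{(n)}) \le 3 r^{-1/3} \Big( \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[(\rho_i^{\otimes n})^{1-s} (\rho_j^{\otimes n})^{s}] \Big)^{1/3},$$

where $E^{(n)}$ denotes the respective detectors in the tensor product case $S^{\otimes n}$. It follows that

$$\frac{1}{n} \log \mathrm{Err}(E^{(n)}) \le \frac{1}{3n} \log \Big( \sum_{1 \le i,j \le r,\, j \ne i} \inf_{s \in [0,1]} \operatorname{tr}[(\rho_i^{\otimes n})^{1-s} (\rho_j^{\otimes n})^{s}] \Big) + o(1) = -\frac{1}{3} \xi_{QCB}(S) + o(1),$$

which proves our claim. □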
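The optimization over $\varepsilon$ in the proof can be verified directly: setting the derivative $2 - 2K\varepsilon^{-3}$ of $2\varepsilon + \varepsilon^{-2}K$ to zero gives $\varepsilon = K^{1/3}$ and the minimum value $3K^{1/3}$. A short numerical sketch follows; the value of `K` and the function names are hypothetical and chosen only for illustration.

```python
def bound(eps, K):
    """Right-hand side of the trade-off 2 * eps + K / eps**2."""
    return 2.0 * eps + K / eps ** 2

def optimal_eps(K):
    """Stationary point of eps -> 2 * eps + K / eps**2 on eps > 0."""
    return K ** (1.0 / 3.0)

K = 1e-6  # hypothetical small value standing in for K_n
eps_star = optimal_eps(K)
print(eps_star, bound(eps_star, K), 3.0 * K ** (1.0 / 3.0))
print(bound(0.5 * eps_star, K), bound(2.0 * eps_star, K))
```

Because the minimum scales like $K_n^{1/3}$ while $K_n$ decays at rate $\xi_{QCB}(S)$, the resulting error exponent carries the factor 1/3 appearing in the final bound.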